OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Improving OCR Accuracy for Classical Critical Editions

Internal identifier: 000962 (Main/Exploration); previous: 000961; next: 000963

Authors: Federico Boschetti [United States]; Matteo Romanello [United States]; Alison Babeu [United States]; David Bamman [United States]; Gregory Crane [United States]

RBID : ISTEX:E139A13B4800B4F0FC4DA869252849D648DB14FF

Abstract

This paper describes a work-flow designed to populate a digital library of ancient Greek critical editions with highly accurate OCR scanned text. While the most recently available OCR engines are now able after suitable training to deal with the polytonic Greek fonts used in 19th and 20th century editions, further improvements can also be achieved with postprocessing. In particular, the progressive multiple alignment method applied to different OCR outputs based on the same images is discussed in this paper.
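
The abstract does not spell out the alignment procedure, so the following is only a simplified sketch of the general idea of combining several OCR outputs of the same image, not the authors' method. Instead of a full progressive multiple alignment, it aligns each output pairwise against the first one (used as a pivot) with Python's `difflib` and takes a per-character majority vote. The function name `vote_on_pivot` and the sample strings are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def vote_on_pivot(outputs):
    """Majority vote per character, using outputs[0] as the alignment pivot.

    Simplification (assumption): only substitutions are corrected; characters
    inserted or deleted relative to the pivot are ignored.
    """
    pivot = outputs[0]
    # One vote counter per pivot position, seeded with the pivot's own character.
    votes = [Counter({ch: 1}) for ch in pivot]
    for other in outputs[1:]:
        matcher = SequenceMatcher(None, pivot, other, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Only spans of equal length give a one-to-one character mapping.
            if tag in ("equal", "replace") and (i2 - i1) == (j2 - j1):
                for offset in range(i2 - i1):
                    votes[i1 + offset][other[j1 + offset]] += 1
    # Keep the most frequently proposed character at each pivot position.
    return "".join(counter.most_common(1)[0][0] for counter in votes)


if __name__ == "__main__":
    # Three hypothetical OCR readings of the same line, each with one error at most.
    readings = ["menin aelde thea", "menin aeide thea", "menin aeide tbea"]
    print(vote_on_pivot(readings))
```

A real work-flow would also need to handle insertions and deletions and would typically weight the engines by their measured reliability; this sketch leaves both out.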

DOI: 10.1007/978-3-642-04346-8_17


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Improving OCR Accuracy for Classical Critical Editions</title>
<author>
<name sortKey="Boschetti, Federico" sort="Boschetti, Federico" uniqKey="Boschetti F" first="Federico" last="Boschetti">Federico Boschetti</name>
</author>
<author>
<name sortKey="Romanello, Matteo" sort="Romanello, Matteo" uniqKey="Romanello M" first="Matteo" last="Romanello">Matteo Romanello</name>
</author>
<author>
<name sortKey="Babeu, Alison" sort="Babeu, Alison" uniqKey="Babeu A" first="Alison" last="Babeu">Alison Babeu</name>
</author>
<author>
<name sortKey="Bamman, David" sort="Bamman, David" uniqKey="Bamman D" first="David" last="Bamman">David Bamman</name>
</author>
<author>
<name sortKey="Crane, Gregory" sort="Crane, Gregory" uniqKey="Crane G" first="Gregory" last="Crane">Gregory Crane</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:E139A13B4800B4F0FC4DA869252849D648DB14FF</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-04346-8_17</idno>
<idno type="url">https://api.istex.fr/document/E139A13B4800B4F0FC4DA869252849D648DB14FF/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000008</idno>
<idno type="wicri:Area/Istex/Curation">000008</idno>
<idno type="wicri:Area/Istex/Checkpoint">000484</idno>
<idno type="wicri:doubleKey">0302-9743:2009:Boschetti F:improving:ocr:accuracy</idno>
<idno type="wicri:Area/Main/Merge">000970</idno>
<idno type="wicri:Area/Main/Curation">000962</idno>
<idno type="wicri:Area/Main/Exploration">000962</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Improving OCR Accuracy for Classical Critical Editions</title>
<author>
<name sortKey="Boschetti, Federico" sort="Boschetti, Federico" uniqKey="Boschetti F" first="Federico" last="Boschetti">Federico Boschetti</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Romanello, Matteo" sort="Romanello, Matteo" uniqKey="Romanello M" first="Matteo" last="Romanello">Matteo Romanello</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Babeu, Alison" sort="Babeu, Alison" uniqKey="Babeu A" first="Alison" last="Babeu">Alison Babeu</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Bamman, David" sort="Bamman, David" uniqKey="Bamman D" first="David" last="Bamman">David Bamman</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Crane, Gregory" sort="Crane, Gregory" uniqKey="Crane G" first="Gregory" last="Crane">Gregory Crane</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">E139A13B4800B4F0FC4DA869252849D648DB14FF</idno>
<idno type="DOI">10.1007/978-3-642-04346-8_17</idno>
<idno type="ChapterID">17</idno>
<idno type="ChapterID">Chap17</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: This paper describes a work-flow designed to populate a digital library of ancient Greek critical editions with highly accurate OCR scanned text. While the most recently available OCR engines are now able after suitable training to deal with the polytonic Greek fonts used in 19th and 20th century editions, further improvements can also be achieved with postprocessing. In particular, the progressive multiple alignment method applied to different OCR outputs based on the same images is discussed in this paper.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Massachusetts</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Massachusetts">
<name sortKey="Boschetti, Federico" sort="Boschetti, Federico" uniqKey="Boschetti F" first="Federico" last="Boschetti">Federico Boschetti</name>
</region>
<name sortKey="Babeu, Alison" sort="Babeu, Alison" uniqKey="Babeu A" first="Alison" last="Babeu">Alison Babeu</name>
<name sortKey="Bamman, David" sort="Bamman, David" uniqKey="Bamman D" first="David" last="Bamman">David Bamman</name>
<name sortKey="Crane, Gregory" sort="Crane, Gregory" uniqKey="Crane G" first="Gregory" last="Crane">Gregory Crane</name>
<name sortKey="Romanello, Matteo" sort="Romanello, Matteo" uniqKey="Romanello M" first="Matteo" last="Romanello">Matteo Romanello</name>
</country>
</tree>
</affiliations>
</record>
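
As an illustration of how a record like the one above could be read programmatically, here is a minimal Python sketch that extracts the author names and the DOI with the standard library. One caveat: the record uses the `wicri:` prefix without declaring it, so a namespace-aware parser rejects it as is. The sketch wraps the record in an element that binds `wicri` to a placeholder URI; that URI, the `wrapper` element and the function names are assumptions made for this example, not part of Dilib or Wicri.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): the record never declares "wicri".
WICRI_PLACEHOLDER_NS = "urn:example:wicri"


def wrap_record(record_xml):
    """Bind the undeclared wicri: prefix so the record becomes parseable."""
    return (
        '<wrapper xmlns:wicri="' + WICRI_PLACEHOLDER_NS + '">'
        + record_xml
        + "</wrapper>"
    )


def authors_and_doi(record_xml):
    """Return (list of author names, DOI or None) from a <record> string."""
    root = ET.fromstring(wrap_record(record_xml))
    # Authors are listed once under titleStmt (and again under biblStruct).
    names = [name.text for name in root.findall(".//titleStmt/author/name")]
    doi = root.findtext(".//publicationStmt/idno[@type='doi']")
    return names, doi
```

The function expects the text from `<record>` to `</record>` without an XML declaration, since a declaration cannot appear inside the wrapper element.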

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000962 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000962 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:E139A13B4800B4F0FC4DA869252849D648DB14FF
   |texte=   Improving OCR Accuracy for Classical Critical Editions
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024